Workplace safety in hazardous environments like construction sites and industrial plants is crucial to prevent accidents and injuries. One of the most important safety measures is ensuring workers wear safety helmets, which protect against head injuries from falling objects and machinery. Non-compliance with helmet regulations increases the risk of serious injuries or fatalities, making effective monitoring essential, especially in large-scale operations where manual oversight is prone to errors and inefficiency.
To overcome these challenges, SafeGuard Corp plans to develop an automated image analysis system capable of detecting whether workers are wearing safety helmets. This system will improve safety enforcement, ensuring compliance and reducing the risk of head injuries. By automating helmet monitoring, SafeGuard aims to enhance efficiency, scalability, and accuracy, ultimately fostering a safer work environment while minimizing human error in safety oversight.
As a data scientist at SafeGuard Corp, you are tasked with developing an image classification model that classifies images into one of two categories:
The dataset consists of 631 images, equally divided into two categories:
Dataset Characteristics:
!pip install tensorflow[and-cuda] numpy==1.25.2 -q
import tensorflow as tf
print("Num GPUs Available:", len(tf.config.list_physical_devices('GPU')))
print(tf.__version__)
Note:
After running the above cell, kindly restart the notebook kernel (for Jupyter Notebook) or runtime (for Google Colab) and run all cells sequentially from the next cell.
On executing the above cell, you might see a warning regarding package dependencies. This warning can be safely ignored, as the code above ensures that all necessary libraries and their dependencies are installed to successfully execute the code in this notebook.
import os
import random
import numpy as np # Numerical and matrix operations
import pandas as pd # Reading CSV files and tabular data
import seaborn as sns
import matplotlib.image as mpimg # Reading images from disk
import matplotlib.pyplot as plt # Plotting and visualizing images
import math # Mathematical operations
import cv2
# Tensorflow modules
import keras
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator # Importing the ImageDataGenerator for data augmentation
from tensorflow.keras.models import Sequential # Importing the sequential module to define a sequential model
from tensorflow.keras.layers import Dense,Dropout,Flatten,Conv2D,MaxPooling2D,BatchNormalization # Defining all the layers to build our CNN Model
from tensorflow.keras.optimizers import Adam,SGD # Importing the optimizers which can be used in our model
from sklearn import preprocessing # Importing the preprocessing module to preprocess the data
from sklearn.model_selection import train_test_split # Importing train_test_split function to split the data into train and test
from sklearn.metrics import confusion_matrix
from tensorflow.keras.models import Model
from sklearn.preprocessing import LabelEncoder
from PIL import Image
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications import VGG16 # Pretrained VGG16 for transfer learning
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping
# Display images using OpenCV in Colab
from google.colab.patches import cv2_imshow
# Functions for evaluating the performance of classification models
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score, recall_score, precision_score, classification_report
# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
# Set the seed using keras.utils.set_random_seed. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
tf.keras.utils.set_random_seed(812)
from google.colab import files
uploaded = files.upload()
# Step1: Data overview
images_path = "images_proj.npy"
labels_path = "Labels_proj.csv"
images = np.load(images_path)
labels_df = pd.read_csv(labels_path)
print("images.shape:", images.shape)
print("labels.shape:", labels_df.shape)
print(labels_df.head())
# assume labels_df has column 'Label' — if different, adapt accordingly
labels = labels_df['Label'].astype(str).values
from collections import Counter
print("Class counts:", Counter(labels))
This means:
- The images are consistent in size and format, which simplifies preprocessing and model training.
- Labels have shape (631, 1): exactly one label per image, so there is no label–image mismatch.
Class counts:
The dataset is nearly balanced, with only a small difference of 9 images between the two classes. This means no major action is required for class balancing.
The dataset is clean and ready for EDA
# Step2: EDA - show random images per class and class balance
le = LabelEncoder()
y = le.fit_transform(labels)
class_names = list(le.classes_)
print("Class names:", class_names)
def show_random_for_class(images, y, class_index, n=6):
    idxs = np.where(y == class_index)[0]
    sel = np.random.choice(idxs, size=min(len(idxs), n), replace=False)
    plt.figure(figsize=(15, 3))
    for i, idx in enumerate(sel):
        plt.subplot(1, n, i + 1)
        img = images[idx]
        if img.ndim == 2:
            plt.imshow(img, cmap='gray')
        else:
            plt.imshow(img.astype('uint8'))  # cast to uint8 so float arrays display correctly
        plt.axis('off')
        plt.title(class_names[class_index])
    plt.show()
show_random_for_class(images, y, 0, n=6)
show_random_for_class(images, y, 1, n=6)
# class balance
import pandas as pd
print(pd.Series(labels).value_counts())
The dataset contains two classes:
These correctly represent the binary classification task required for helmet detection.
Class counts:
The dataset is nearly perfectly balanced. No oversampling or undersampling is necessary.
Twelve sample images were displayed (six from each class). Key observations: the color tones look unnatural, which suggests the channel order is not standard RGB. This does not affect the ability to train a CNN, but it explains the unusual color tones.
Despite unusual coloration, both classes show good variability in background, lighting, posture, and scene type — useful for building a robust classifier.
The dataset is clean, diverse, and suitable for model development.
type(labels)
# Labels is a numpy array, flatten it and count values
class_counts = np.unique(labels, return_counts=True)
print("Class distribution:")
for cls, count in zip(class_counts[0], class_counts[1]):
    print(f"Class {cls}: {count}")
# Plot
plt.figure(figsize=(6,4))
sns.barplot(x=class_counts[0].astype(str), y=class_counts[1])
plt.title("Class Distribution")
plt.xlabel("Class")
plt.ylabel("Number of Images")
plt.show()
The dataset contains:
This shows that the dataset is very well balanced, with only a small difference of 9 images between the two classes.
Because the imbalance is minimal, no additional techniques such as oversampling, undersampling, or class weighting are required. The model can be trained directly without bias toward either class.
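Had the split been more skewed, class weighting would be one way to compensate without resampling. A minimal sketch using scikit-learn's `compute_class_weight`, with counts matching this dataset's 320/311 split (the label names are illustrative):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels matching this dataset's near-balanced 320/311 split
labels = np.array(["helmet"] * 320 + ["no_helmet"] * 311)
classes = np.unique(labels)

# "balanced" weights: n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
class_weight = dict(enumerate(weights))  # could be passed to model.fit(..., class_weight=class_weight)
print(class_weight)
```

For a balanced dataset like this one, both weights come out close to 1.0, confirming that weighting is unnecessary here.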
# Show first 5 (HEAD)
print("HEAD (first 5 images):")
for i in range(5):
    plt.imshow(images[i])
    plt.title(f"Label: {labels[i]}")
    plt.show()
# Show last 5 (TAIL)
print("TAIL (last 5 images):")
for i in range(1, 6):
    plt.imshow(images[-i])
    plt.title(f"Label: {labels[-i]}")
    plt.show()
Preview the Data (Head / Tail):
images[0] through images[4] → first 5 images → head
images[-1] through images[-5] → last 5 images → tail
print("Min pixel value:", images.min())
print("Max pixel value:", images.max())
print("Mean pixel value:", images.mean())
print("Std pixel value:", images.std())
The dataset contains raw 8-bit pixel values (0–255) with good brightness balance and strong contrast, suggesting the images are well-distributed and suitable for training after appropriate preprocessing (e.g., normalization to 0–1 or standardization).
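The two preprocessing options mentioned above can be sketched on a toy batch (shapes here are illustrative, not the actual dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)  # toy image batch

# Option 1: min-max scaling to [0, 1] — the approach used later in this notebook
scaled = batch.astype(np.float32) / 255.0

# Option 2: standardization to zero mean / unit variance, per channel
mean = scaled.mean(axis=(0, 1, 2), keepdims=True)
std = scaled.std(axis=(0, 1, 2), keepdims=True)
standardized = (scaled - mean) / (std + 1e-7)

print(scaled.min(), scaled.max())          # stays within [0, 1]
print(standardized.mean(axis=(0, 1, 2)))   # approximately zero per channel
```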
plt.hist(images.ravel(), bins=50)
plt.title("Pixel Intensity Distribution")
plt.show()
Range of Intensities in pixel intensity distribution histogram
Peaks at Extremes
Midrange Distribution
Skewness
Summary:
The image dataset has a large number of saturated pixels (both black and white), with smaller spikes across midrange intensities. This indicates strong contrast and possibly repeated intensity patterns in the images, which may affect preprocessing or model learning.
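One way to quantify the saturation visible in the histogram is to count fully black and fully white pixels. A small sketch on a synthetic batch (the exact 0/255 thresholds are an assumption; real images may saturate near, not at, the extremes):

```python
import numpy as np

rng = np.random.default_rng(1)
imgs = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)  # stand-in batch

flat = imgs.ravel()
frac_black = np.mean(flat == 0)    # fraction of fully dark pixels
frac_white = np.mean(flat == 255)  # fraction of fully saturated pixels
print(f"black: {frac_black:.4f}, white: {frac_white:.4f}")
```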
shapes = [img.shape for img in images]
set(shapes)
brightness = images.mean(axis=(1,2,3))
plt.hist(brightness, bins=30)
plt.title("Brightness Distribution")
plt.show()
Intensity Range in brightness distribution histogram
Shape
Skewness and Symmetry
Implication for Images
Summary:
The dataset’s images have mostly mid-range brightness, forming a near-normal distribution. This suggests good contrast and balanced lighting, which should help the CNN learn features effectively without being biased by extreme dark or bright pixels.
contrast = images.std(axis=(1,2,3))
plt.hist(contrast, bins=30)
plt.title("Contrast Distribution")
plt.show()
red_mean = images[:,:,:,0].mean()
green_mean = images[:,:,:,1].mean()
blue_mean = images[:,:,:,2].mean()
print(red_mean, green_mean, blue_mean)
This information is useful for diagnosing channel-order issues and assessing color balance. The dataset's color distribution leans toward the blue channel, with all channels showing moderate and balanced intensity ranges, indicating no severe color imbalance.
# pick one example image
i = 0
img = images[i] # shape (H, W, 3) but unknown order
# Plot raw image (as loaded)
plt.figure(figsize=(5,5))
plt.imshow(img) # matplotlib expects RGB
plt.title("Raw image as loaded")
plt.axis("off")
plt.show()
# Plot channels separately
fig, ax = plt.subplots(1, 3, figsize=(12,4))
ax[0].imshow(img[:, :, 0], cmap="gray")
ax[0].set_title("Channel 0")
ax[1].imshow(img[:, :, 1], cmap="gray")
ax[1].set_title("Channel 1")
ax[2].imshow(img[:, :, 2], cmap="gray")
ax[2].set_title("Channel 2")
for a in ax:
    a.axis("off")
plt.show()
All channels are grayscale but Channel 0 has visibly more contrast / more intense structure than Channels 1 and 2.
In normal RGB images of humans, we would expect:
Red channel (R) → strongest (skin tones, helmets, warm colors)
Green (G) → medium intensity
Blue (B) → weakest
But here Channel 0 is the strongest and Channel 2 the weakest, which is the BGR order (as produced by OpenCV), so we need to convert BGR → RGB before any preprocessing.
images_rgb = images[:, :, :, ::-1]
i = 0 # pick any image
plt.figure(figsize=(10,5))
# BEFORE (BGR interpreted wrong as RGB)
plt.subplot(1,2,1)
plt.imshow(images[i]) # raw image (incorrect colors)
plt.title("Before Conversion (BGR interpreted as RGB)")
plt.axis("off")
# AFTER (RGB correct)
plt.subplot(1,2,2)
plt.imshow(images_rgb[i]) # correct image
plt.title("After Conversion (Correct RGB)")
plt.axis("off")
plt.show()
print("Mean per channel BEFORE:", np.mean(images, axis=(0,1,2)))
print("Mean per channel AFTER:", np.mean(images_rgb, axis=(0,1,2)))
The first and third per-channel means swap after the conversion, showing that the array was successfully converted from BGR to RGB.
The variable images_rgb is the new array containing the updated RGB images, so we will overwrite the original images to prevent any accidental use of the wrong BGR version.
# Convert BGR → RGB (overwrite original)
images = images[:, :, :, ::-1]
print("Images converted to RGB. New shape:", images.shape)
# Show first 5 (HEAD)
print("HEAD (first 5 images):")
for i in range(5):
    plt.imshow(images[i])
    plt.title(f"Label: {labels[i]}")
    plt.show()
# Show last 5 (TAIL)
print("TAIL (last 5 images):")
for i in range(1, 6):
    plt.imshow(images[-i])
    plt.title(f"Label: {labels[-i]}")
    plt.show()
The new shape (631, 200, 200, 3) indicates:
The array has also been overwritten with the corrected RGB values (it was previously BGR). Checking the first and last 5 images confirms the conversion worked as expected.
# PARAMETERS
SMALL_SIZE = (128, 128) # baseline CNN size
VGG_SIZE = (224, 224) # VGG16 input size
def resize_and_convert(images, size=(128,128), to_gray=False, dtype=np.float32):
    out = []
    for img in images:
        # PIL expects uint8 input, so cast non-uint8 arrays first
        if img.dtype != np.uint8:
            im = Image.fromarray(img.astype('uint8'))
        else:
            im = Image.fromarray(img)
        imr = im.resize(size)
        if to_gray:
            imr = imr.convert('L')  # grayscale
            arr = np.array(imr).astype(dtype)
            arr = arr[..., np.newaxis]  # keep channel axis
        else:
            imr = imr.convert('RGB')
            arr = np.array(imr).astype(dtype)
        out.append(arr)
    return np.array(out)
# create grayscale dataset for baseline CNN
X_gray = resize_and_convert(images, size=SMALL_SIZE, to_gray=True)
X_gray = X_gray / 255.0 # normalize
y_cat = to_categorical(y, num_classes=2)
# Stratified split (train 60%, val 20%, test 20%)
X_temp, X_test, y_temp, y_test = train_test_split(X_gray, y_cat, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, stratify=np.argmax(y_temp,axis=1), random_state=42)
print("Shapes (grayscale 128x128x1):", X_train.shape, X_val.shape, X_test.shape)
# show a before/after for a sample
import matplotlib.pyplot as plt
plt.figure(figsize=(8,4))
plt.subplot(1,2,1)
plt.title("Original sample (resized RGB)")
plt.imshow(Image.fromarray(images[0]).resize(SMALL_SIZE))
plt.axis('off')
plt.subplot(1,2,2)
plt.title("Grayscale 128x128")
plt.imshow(X_gray[0].squeeze(), cmap='gray')
plt.axis('off')
plt.show()
The preprocessing steps were successfully completed, and the results confirm that the data is prepared correctly for model training:
The sample visualization shows:
This confirms that the grayscale conversion was executed correctly and the channel dimension has been preserved as expected for CNN input.
The dataset was divided into stratified train/validation/test sets with the following shapes:
Shapes (grayscale 128x128x1):
(378, 128, 128, 1) → Training set
(126, 128, 128, 1) → Validation set
(127, 128, 128, 1) → Test set
These sizes correspond to a 60% / 20% / 20% split, meaning the dataset split worked as intended. Stratification ensures that the class balance is preserved across all splits.
All grayscale images have been scaled to the range 0–1 using X_gray / 255.0, which is appropriate for neural network training and improves training stability.
All three preprocessing goals have been met:
The resulting dataset is correctly preprocessed and ready for model development.
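The 60% / 20% / 20% arithmetic of the two-stage split can be verified on a toy array: the first split holds out 20% for test, and taking 25% of the remaining 80% yields another 20% of the total for validation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.tile([0, 1], 500)  # balanced toy labels

# Stage 1: hold out 20% for the test set
X_tmp, X_te, y_tmp, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
# Stage 2: 25% of the remaining 80% = 20% of the total, for validation
X_tr, X_va, y_tr, y_va = train_test_split(X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)

print(len(X_tr), len(X_va), len(X_te))  # 600 200 200
```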
def model_performance_classification(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    """
    # threshold the predicted probability of class 1
    pred = (model.predict(predictors)[:, 1] > 0.5).astype(int)
    target = target.to_numpy().reshape(-1)
    acc = accuracy_score(target, pred)
    recall = recall_score(target, pred, average='weighted')
    precision = precision_score(target, pred, average='weighted')
    f1 = f1_score(target, pred, average='weighted')
    df_perf = pd.DataFrame({
        "Accuracy": acc,
        "Recall": recall,
        "Precision": precision,
        "F1 Score": f1
    }, index=[0])
    return df_perf
def plot_confusion_matrix(model, predictors, target, ml=False):
    pred = (model.predict(predictors)[:, 1] > 0.5).astype(int)
    target = target.to_numpy().reshape(-1)
    cm = tf.math.confusion_matrix(target, pred)  # local name avoids shadowing sklearn's confusion_matrix
    f, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(
        cm,
        annot=True,
        linewidths=.4,
        fmt="d",
        square=True,
        ax=ax
    )
    plt.show()
# ======================================================
# MODEL 1: SIMPLE BASELINE CNN
# ======================================================
input_shape = (128, 128, 1)
model_cnn = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', padding='same', input_shape=input_shape),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu', padding='same'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(128, (3,3), activation='relu', padding='same'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(2, activation='softmax')  # 2 classes → softmax
])
model_cnn.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # correct loss for one-hot labels with a softmax output
    metrics=['accuracy']
)
model_cnn.summary()
# Early stopping to prevent overfitting
es = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history_cnn = model_cnn.fit(
    X_train, y_train,
    epochs=25,
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[es],
    verbose=1
)
y_test_labels = pd.Series(np.argmax(y_test, axis=1))
# compute performance
performance_cnn = model_performance_classification(model_cnn, X_test, y_test_labels)
performance_cnn
plot_confusion_matrix(model_cnn, X_test, y_test_labels)
# =========================================
# VISUALIZING PREDICTIONS
# =========================================
class_labels = ['Class 0', 'Class 1'] # EDIT IF YOUR LABELS HAVE NAMES
def show_predictions(model, images, true_labels, n=6):
    plt.figure(figsize=(14, 8))
    idxs = np.random.choice(len(images), n, replace=False)
    for i, idx in enumerate(idxs):
        img = images[idx]
        true_label = true_labels[idx]
        pred_prob = model.predict(img[np.newaxis, ...])[0]
        pred_label = np.argmax(pred_prob)
        plt.subplot(2, 3, i + 1)
        plt.imshow(img.squeeze(), cmap='gray')
        plt.title(f"Pred: {class_labels[pred_label]}\nTrue: {class_labels[true_label]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
# run the visualization
show_predictions(model_cnn, X_test, np.argmax(y_test, axis=1))
Confusion matrix:
# ======================================================
# PREPARE VGG16 DATA (224×224×3)
# ======================================================
X_vgg = resize_and_convert(images_rgb, size=VGG_SIZE, to_gray=False)
X_vgg = X_vgg / 255.0 # normalize to 0-1
y_cat = to_categorical(y, num_classes=2)
# stratified split (same as before)
X_temp_vgg, X_test_vgg, y_temp_vgg, y_test_vgg = train_test_split(
    X_vgg, y_cat, test_size=0.2, stratify=y, random_state=42
)
X_train_vgg, X_val_vgg, y_train_vgg, y_val_vgg = train_test_split(
    X_temp_vgg, y_temp_vgg, test_size=0.25,
    stratify=np.argmax(y_temp_vgg, axis=1), random_state=42
)
print("VGG16 Shapes:", X_train_vgg.shape, X_val_vgg.shape, X_test_vgg.shape)
# ======================================================
# MODEL 2: VGG16 BASE MODEL
# ======================================================
# Load pretrained VGG16 without top layers
vgg_base = VGG16(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)
# Freeze all convolutional layers
for layer in vgg_base.layers:
    layer.trainable = False
# Build model
model_vgg = models.Sequential([
    vgg_base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(2, activation='softmax')  # 2 classes
])
model_vgg.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # correct loss for one-hot labels with a softmax output
    metrics=['accuracy']
)
model_vgg.summary()
# Training the VGG-16 Model
es2 = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history_vgg = model_vgg.fit(
    X_train_vgg, y_train_vgg,
    epochs=20,
    batch_size=32,
    validation_data=(X_val_vgg, y_val_vgg),
    callbacks=[es2],
    verbose=1
)
# Evaluate VGG-16 on Test Data
y_test_vgg_labels = pd.Series(np.argmax(y_test_vgg, axis=1))
performance_vgg = model_performance_classification(model_vgg, X_test_vgg, y_test_vgg_labels)
performance_vgg
# Confusion Matrix
plot_confusion_matrix(model_vgg, X_test_vgg, y_test_vgg_labels)
# ======================================================
# VISUALIZING VGG16 PREDICTIONS
# ======================================================
class_labels = ['Class 0', 'Class 1'] # edit if needed
def show_predictions_vgg(model, images, true_labels, n=6):
    plt.figure(figsize=(14, 8))
    idxs = np.random.choice(len(images), n, replace=False)
    for i, idx in enumerate(idxs):
        img = images[idx]
        true_label = true_labels[idx]
        pred_prob = model.predict(img[np.newaxis, ...])[0]
        pred_label = np.argmax(pred_prob)
        plt.subplot(2, 3, i + 1)
        plt.imshow(img)  # RGB image
        plt.title(f"Pred: {class_labels[pred_label]}\nTrue: {class_labels[true_label]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
# run visualization
show_predictions_vgg(model_vgg, X_test_vgg, np.argmax(y_test_vgg, axis=1))
Confusion matrix:
Strengths:
Limitations:
# ======================================================
# MODEL 3: VGG16 + FFNN
# ======================================================
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models
# Load pretrained VGG16 without the top layers
vgg_base = VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3)
)
# Freeze convolutional layers
for layer in vgg_base.layers:
    layer.trainable = False
# FFNN classifier block (deeper than Model 2)
ffnn_model = models.Sequential([
    vgg_base,
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(2, activation='softmax')  # two classes
])
ffnn_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # correct loss for one-hot labels with a softmax output
    metrics=['accuracy']
)
ffnn_model.summary()
# Training — VGG16 + FFNN Model
es3 = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history_ffnn = ffnn_model.fit(
    X_train_vgg, y_train_vgg,
    epochs=25,
    batch_size=32,
    validation_data=(X_val_vgg, y_val_vgg),
    callbacks=[es3],
    verbose=1
)
# Evaluation Using Your Utility Functions
y_test_ffnn_labels = pd.Series(np.argmax(y_test_vgg, axis=1))
performance_ffnn = model_performance_classification(
    ffnn_model,
    X_test_vgg,
    y_test_ffnn_labels
)
performance_ffnn
# Confusion Matrix
plot_confusion_matrix(ffnn_model, X_test_vgg, y_test_ffnn_labels)
# ======================================================
# VISUALIZING PREDICTIONS — VGG16 + FFNN
# ======================================================
class_labels = ["Class 0", "Class 1"] # replace names if needed (Helmet / No Helmet)
def show_predictions_ffnn(model, images, true_labels, n=6):
    plt.figure(figsize=(14, 8))
    idxs = np.random.choice(len(images), n, replace=False)
    for i, idx in enumerate(idxs):
        img = images[idx]
        true_label = true_labels[idx]
        pred = model.predict(img[np.newaxis, ...])[0]
        pred_label = np.argmax(pred)
        plt.subplot(2, 3, i + 1)
        plt.imshow(img)  # RGB image
        plt.title(f"Pred: {class_labels[pred_label]}\nTrue: {class_labels[true_label]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
# Run visualization
show_predictions_ffnn(ffnn_model, X_test_vgg, np.argmax(y_test_vgg, axis=1))
Strengths:
Limitations:
CNNs have the property of translational invariance, meaning they can recognize an object even when it shifts position within the image. Taking this attribute into account, we can augment the images using the techniques listed below.
Remember, data augmentation should not be used in the validation/test data set.
# ======================================================
# DATA AUGMENTATION (Training Only)
# ======================================================
train_aug = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=False,  # usually avoid vertical flips for images of people
    width_shift_range=0.15,
    height_shift_range=0.15,
    rotation_range=20,
    shear_range=0.10,
    zoom_range=0.15
)
# Create generator for augmented images
train_gen = train_aug.flow(
    X_train_vgg,
    y_train_vgg,
    batch_size=32,
    shuffle=True
)
# Validation generator (NO augmentation)
val_gen = ImageDataGenerator().flow(
    X_val_vgg,
    y_val_vgg,
    batch_size=32
)
# ======================================================
# MODEL 4: VGG16 + FFNN + DATA AUGMENTATION
# ======================================================
vgg_base_aug = VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3)
)
# freeze convolution layers
for layer in vgg_base_aug.layers:
    layer.trainable = False
aug_model = models.Sequential([
    vgg_base_aug,
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(2, activation='softmax')
])
aug_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',  # correct loss for one-hot labels with a softmax output
    metrics=['accuracy']
)
aug_model.summary()
# Training / Early stopping
es4 = EarlyStopping(monitor='val_loss', patience=7, restore_best_weights=True)
history_aug = aug_model.fit(
    train_gen,
    epochs=30,
    validation_data=val_gen,
    callbacks=[es4],
    verbose=1
)
# Evaluation on the Test Set
y_test_aug_labels = pd.Series(np.argmax(y_test_vgg, axis=1))
performance_aug = model_performance_classification(
    aug_model,
    X_test_vgg,
    y_test_aug_labels
)
performance_aug
# Confusion Matrix
plot_confusion_matrix(aug_model, X_test_vgg, y_test_aug_labels)
# ======================================================
# VISUALIZING PREDICTIONS — MODEL 4
# ======================================================
class_labels = ["Class 0", "Class 1"] # Replace with Helmet / No Helmet if needed
def show_predictions_aug(model, images, true_labels, n=6):
    plt.figure(figsize=(14, 8))
    idxs = np.random.choice(len(images), n, replace=False)
    for i, idx in enumerate(idxs):
        img = images[idx]
        true_label = true_labels[idx]
        pred = model.predict(img[np.newaxis, ...])[0]
        pred_label = np.argmax(pred)
        plt.subplot(2, 3, i + 1)
        plt.imshow(img)
        plt.title(f"Pred: {class_labels[pred_label]}\nTrue: {class_labels[true_label]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()
# Run visualization
show_predictions_aug(aug_model, X_test_vgg, np.argmax(y_test_vgg, axis=1))
Observation:
The VGG16 + FFNN model with data augmentation shows exceptional performance on the dataset. The architecture leverages the pre-trained VGG16 as a feature extractor (with 14.7M non-trainable parameters) and adds a fully connected network with three dense layers and dropout for regularization.
During training, the model quickly converges: by epoch 2, both training and validation accuracy exceed 97%, and by epoch 6–7, the training accuracy reaches nearly 100% with validation accuracy consistently at 99–100%. The very low validation loss indicates that the model generalizes extremely well to the validation set.
The final evaluation metrics confirm perfect classification: accuracy, recall, precision, and F1-score are all 1.0. The confusion matrix shows no misclassifications across 127 samples, demonstrating that the model correctly distinguishes both classes with zero errors.
Overall, this combination of transfer learning, dense layers, dropout, and data augmentation appears highly effective for the dataset at hand. One cautionary note is that such perfect results may indicate a small or relatively simple dataset, so further testing on unseen or more diverse data would be advisable to ensure robust generalization.
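One way to probe whether such perfect scores generalize would be stratified k-fold evaluation across the whole dataset. A minimal sketch with scikit-learn's `StratifiedKFold`, using a placeholder linear classifier on synthetic features in place of the CNN (the data and model here are stand-ins, not the actual pipeline):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 16))   # stand-in for flattened image features
y = np.tile([0, 1], 30)         # balanced binary labels

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in skf.split(X, y):
    # fit on the training folds, score on the held-out fold
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[val_idx], clf.predict(X[val_idx])))

print(f"CV accuracy: {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

A consistently high mean with low variance across folds would strengthen the case that the model, and not a lucky split, explains the results.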
# ----------------------------------------------------
# Prepare 224×224 RGB versions for VGG-based models
# ----------------------------------------------------
def to_rgb_and_resize(x):
    # 1) repeat grayscale channel → 3-channel RGB
    x_rgb = np.repeat(x, 3, axis=-1)
    # 2) resize to 224×224 for VGG
    x_resized = tf.image.resize(x_rgb, (224, 224)).numpy()
    return x_resized
X_val_rgb = to_rgb_and_resize(X_val)
X_test_rgb = to_rgb_and_resize(X_test) # you will need this later for test section
# ----------------------------------------------------
# Model Performance Comparison and Final Model Selection
# ----------------------------------------------------
def evaluate_model(model, X_val, y_val):
    preds = model.predict(X_val)
    preds = np.argmax(preds, axis=1)
    y_true = np.argmax(y_val, axis=1)
    acc = accuracy_score(y_true, preds)
    prec = precision_score(y_true, preds)
    rec = recall_score(y_true, preds)
    f1 = f1_score(y_true, preds)
    return acc, prec, rec, f1
# CNN uses grayscale 128×128×1
cnn_metrics = evaluate_model(model_cnn, X_val, y_val)
# VGG-based models use 224×224×3
vgg_metrics = evaluate_model(model_vgg, X_val_rgb, y_val)
vggf_metrics = evaluate_model(ffnn_model, X_val_rgb, y_val)
aug_metrics = evaluate_model(aug_model, X_val_rgb, y_val)
comparison_df = pd.DataFrame({
    "Model": ["Simple CNN", "VGG16 (Base)", "VGG16 + FFNN", "VGG-16 + FFNN + Data Augmentation"],
    "Accuracy": [cnn_metrics[0], vgg_metrics[0], vggf_metrics[0], aug_metrics[0]],
    "Precision": [cnn_metrics[1], vgg_metrics[1], vggf_metrics[1], aug_metrics[1]],
    "Recall": [cnn_metrics[2], vgg_metrics[2], vggf_metrics[2], aug_metrics[2]],
    "F1 Score": [cnn_metrics[3], vgg_metrics[3], vggf_metrics[3], aug_metrics[3]],
})
print("\n========================")
print("MODEL PERFORMANCE COMPARISON")
print("========================\n")
print(comparison_df)
# ----------------------------------------------------
# Select Best Model Based on Accuracy
# ----------------------------------------------------
best_model_name = comparison_df.loc[comparison_df["Accuracy"].idxmax(), "Model"]
if best_model_name == "Simple CNN":
    best_model = model_cnn
elif best_model_name == "VGG16 (Base)":
    best_model = model_vgg
elif best_model_name == "VGG16 + FFNN":
    best_model = ffnn_model
elif best_model_name == "VGG-16 + FFNN + Data Augmentation":
    best_model = aug_model
else:
    raise ValueError("Model name not recognized.")
print(f"\nBest model selected: **{best_model_name}**\n")
print("X_val shape:", X_val.shape)
print("X_val_rgb shape:", X_val_rgb.shape)
# ----------------------------------------------------
# Evaluate Best Model on Test Set
# ----------------------------------------------------
print("\n========================")
print("TEST PERFORMANCE OF BEST MODEL")
print("========================\n")
test_preds = best_model.predict(X_test)  # NOTE: if the best model is VGG-based, pass X_test_rgb (224×224×3) instead
test_preds = np.argmax(test_preds, axis=1)
y_true_test = np.argmax(y_test, axis=1)
test_acc = accuracy_score(y_true_test, test_preds)
test_prec = precision_score(y_true_test, test_preds)
test_rec = recall_score(y_true_test, test_preds)
test_f1 = f1_score(y_true_test, test_preds)
print(f"Test Accuracy: {test_acc:.4f}")
print(f"Test Precision: {test_prec:.4f}")
print(f"Test Recall: {test_rec:.4f}")
print(f"Test F1 Score: {test_f1:.4f}")
# Confusion matrix
cm = confusion_matrix(y_true_test, test_preds)
print("\nConfusion Matrix:")
print(cm)
# Visualization Code for Comparison
comparison_df.plot(x="Model", y=["Accuracy", "Precision", "Recall", "F1 Score"],
                   kind="bar", figsize=(10, 5), title="Model Performance Comparison")
plt.xticks(rotation=45)
plt.show()
# ============================================================
# 1. CHECK VALIDATION SET SIZE + CLASS DISTRIBUTION
# ============================================================
print("=== Validation Set Size ===")
print("Total validation images:", len(X_val))
val_class_counts = np.sum(y_val, axis=0)
print("\nImages per class in validation set:")
for idx, count in enumerate(val_class_counts):
    print(f"Class {idx}: {count}")
# ============================================================
# 2. CHECK FOR DATA LEAKAGE
# ============================================================
print("\n=== Checking Data Leakage ===")
import hashlib
def hash_images(X):
    return {hashlib.md5(img.tobytes()).hexdigest() for img in X}
train_hashes = hash_images(X_train)
val_hashes = hash_images(X_val)
test_hashes = hash_images(X_test)
print("Train ↔ Val overlap:", len(train_hashes & val_hashes))
print("Train ↔ Test overlap:", len(train_hashes & test_hashes))
print("Val ↔ Test overlap:", len(val_hashes & test_hashes))
if len(train_hashes & val_hashes) == 0 and len(train_hashes & test_hashes) == 0 and len(val_hashes & test_hashes) == 0:
    print("No leakage detected ✔️")
else:
    print("⚠️ WARNING: Possible data leakage detected!")
# ============================================================
# 3. EVALUATE ALL MODELS ON TEST SET **with correct VGG inputs**
# ============================================================
print("\n=== PERFORMANCE ON TEST SET ===")
cnn_test = evaluate_model(model_cnn, X_test, y_test) # shape (128,128,1)
vgg_test = evaluate_model(model_vgg, X_test_vgg, y_test) # shape (224,224,3)
ffnn_test = evaluate_model(ffnn_model, X_test_vgg, y_test) # shape (224,224,3)
aug_test = evaluate_model(aug_model, X_test_vgg, y_test) # shape (224,224,3)
test_comparison_df = pd.DataFrame({
    "Model": ["Simple CNN", "VGG16 (Base)", "VGG16 + FFNN", "VGG16 + FFNN + Augmentation"],
    "Accuracy": [cnn_test[0], vgg_test[0], ffnn_test[0], aug_test[0]],
    "Precision": [cnn_test[1], vgg_test[1], ffnn_test[1], aug_test[1]],
    "Recall": [cnn_test[2], vgg_test[2], ffnn_test[2], aug_test[2]],
    "F1 Score": [cnn_test[3], vgg_test[3], ffnn_test[3], aug_test[3]],
})
print(test_comparison_df)
# ============================================================
# 4. CONFUSION MATRIX (Simple CNN)
# ============================================================
print("\n=== CONFUSION MATRIX FOR SIMPLE CNN (validation set) ===")
cnn_preds = np.argmax(model_cnn.predict(X_val), axis=1)
y_true_val = np.argmax(y_val, axis=1)
cm = confusion_matrix(y_true_val, cnn_preds)
plt.figure(figsize=(6,5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.title("Confusion Matrix - Simple CNN (Validation)")
plt.show()
# ============================================================
# 5. CLASS BALANCE CHECK
# ============================================================
print("\n=== CLASS BALANCE CHECK (TRAIN/VAL/TEST) ===")
train_counts = np.sum(y_train, axis=0)
test_counts = np.sum(y_test, axis=0)

print("Training set:")
for i, c in enumerate(train_counts):
    print(f"Class {i}: {c}")

print("\nValidation set:")
for i, c in enumerate(val_class_counts):
    print(f"Class {i}: {c}")

print("\nTest set:")
for i, c in enumerate(test_counts):
    print(f"Class {i}: {c}")
The sanity checks above support four conclusions: no data leakage, balanced classes, high performance, and practical reliability.
Model selection must always be based on the validation set; the test set is reserved for final reporting and must not be used to choose a model.
On validation performance, the Simple CNN achieved the highest accuracy and F1-score, slightly outperforming all of the VGG-based models. The margins are small, which indicates that every model learned the task well, but the correct model-selection procedure still points to the validation leader.
When evaluated on the test set, the VGG-based models scored slightly higher than the Simple CNN, but this does not change the choice, because the test set must not inform model selection.
The Simple CNN is therefore the selected model, based strictly on validation metrics. The VGG models' test results serve only to confirm generalization, not to determine the best model.
In summary, the selection of the Simple CNN is trustworthy and well justified: the validation and test results confirm that it generalizes to unseen data, and the checks above found no data leakage or class imbalance that would invalidate these conclusions.
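The selection rule described above can be sketched in a few lines: candidates are ranked by their validation F1 score only, and test scores are reported afterwards but never consulted during the choice. All metric values below are hypothetical placeholders, not the notebook's actual results.

```python
# Rank candidate models by VALIDATION F1 only; test scores are reported
# after the fact but deliberately play no role in the selection.
# All numbers here are illustrative assumptions.
val_f1 = {
    "Simple CNN": 0.995,
    "VGG16 (Base)": 0.992,
    "VGG16 + FFNN": 0.993,
    "VGG16 + FFNN + Augmentation": 0.993,
}
test_f1 = {  # reported for completeness, unused in the choice
    "Simple CNN": 0.993,
    "VGG16 (Base)": 0.996,
    "VGG16 + FFNN": 0.997,
    "VGG16 + FFNN + Augmentation": 0.996,
}

best_model = max(val_f1, key=val_f1.get)
print("Selected model (by validation F1):", best_model)
print("Its test F1, reported only after selection:", test_f1[best_model])
```

Note that with these illustrative numbers a VGG variant tops the test-set ranking, yet the selection still lands on the validation leader, which is exactly the discipline argued for above.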
All four models—Simple CNN, VGG16, VGG16 + FFNN, and the augmented version—demonstrated exceptionally high predictive performance, with validation and test accuracies above 99%. This indicates that:
Although VGG-based models performed slightly better on the test set, model selection must be based on the validation set, where the Simple CNN achieved the highest metrics. This suggests that a lightweight architecture can outperform more complex models when the dataset is:
The VGG variants reached perfect or near-perfect results on the test set. This shows that:
All checks confirmed that train, validation, and test splits are clean and non-overlapping. This ensures:
Class distributions were nearly equal across train, validation, and test sets, minimizing bias and improving fairness.
Given its superior validation performance and lower computational footprint:
It requires fewer resources, making it ideal for:
This reduces operational expenses while maintaining high accuracy.
The strong generalization abilities of VGG-based models make them valuable for:
Future scaling if:
This builds redundancy and long-term adaptability.
Introduce probability thresholds to determine when:
Example:
This reduces the risk of misclassification in real-world deployment.
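One way such a threshold policy could look is sketched below: predictions whose top softmax probability falls under a cutoff are routed to human review instead of being auto-labelled. The 0.90 cutoff and the probability rows are illustrative assumptions, not values from the notebook.

```python
import numpy as np

# Confidence-threshold policy sketch: auto-label only when the model's top
# class probability clears THRESHOLD; otherwise flag for human review.
THRESHOLD = 0.90  # assumed operating point; tune on validation data

probs = np.array([
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # ambiguous -> send to review
    [0.10, 0.90],  # confident prediction
])

confidences = probs.max(axis=1)
predictions = probs.argmax(axis=1)

for conf, pred in zip(confidences, predictions):
    if conf >= THRESHOLD:
        print(f"Auto-label class {pred} (confidence {conf:.2f})")
    else:
        print(f"Flag for human review (confidence {conf:.2f})")
```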
To maintain long-term effectiveness, implement:
This ensures HelmNet remains reliable as new data or use cases emerge.
Although current results are strong, performance can degrade in real-world environments with:
Recommend:
This improves resilience and reduces misclassification under challenging conditions.
The augmented VGG model performed equally well, confirming that augmentation helps avoid overfitting. Recommendation:
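For illustration, here is a dependency-free sketch of the simplest such augmentation, random horizontal flipping; the flip probability and dummy batch are assumptions, and the notebook's actual pipeline may use richer transforms.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hflip(batch, p=0.5):
    # Flip each image left-right with probability p; labels stay unchanged,
    # since a mirrored worker still wears (or lacks) a helmet.
    out = batch.copy()
    for i in range(len(out)):
        if rng.random() < p:
            out[i] = out[i][:, ::-1]
    return out

batch = np.random.rand(4, 224, 224, 3)  # dummy batch of RGB images
augmented = random_hflip(batch)
print(augmented.shape)  # shape is preserved
```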
Over time, data distribution may change:
Set a schedule for:
This ensures long-term model accuracy.
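A minimal drift check could pair with such a schedule: compare the share of positive predictions in a recent window against a deployment-time baseline, and flag retraining when the shift exceeds a tolerance. The window contents and the 0.10 tolerance below are assumptions for illustration only.

```python
import numpy as np

# Sketch: flag retraining when the class-1 prediction frequency in a recent
# window drifts from the deployment-time baseline by more than a tolerance.
baseline_preds = np.array([0] * 480 + [1] * 520)  # predictions at deployment
recent_preds = np.array([0] * 300 + [1] * 700)    # predictions, last month

def drift_detected(baseline, recent, tol=0.10):
    # Compare class-1 frequencies between the two windows
    shift = abs(recent.mean() - baseline.mean())
    return shift > tol

print("Retraining recommended:", drift_detected(baseline_preds, recent_preds))
```

In production this simple frequency comparison could be replaced by a proper two-sample test, but the flag-and-retrain logic stays the same.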
Build a lightweight dashboard to display:
This supports:
Clarify where HelmNet should and should not be used:
This ensures expectations are aligned with technical constraints.
Create a final technical/business documentation package covering:
This supports operational scalability and organizational knowledge retention.
Deploy Simple CNN as the official model for HelmNet due to its best validation results and minimal resource requirements. Maintain the VGG16 + FFNN model as a secondary, high-capacity alternative for future scaling and robustness.
Together, these actions form a stable, scalable, and cost-efficient machine-learning pipeline capable of supporting HelmNet’s long-term operational and strategic objectives.
Power Ahead!